Speech quality estimation with deep lattice networks

نویسندگان

چکیده

Intrusive subjective speech quality estimation of mean opinion score (MOS) often involves mapping a raw similarity extracted from differences between the clean and degraded utterance onto MOS with fitted function. More recent models such as support vector regression (SVR) or deep neural networks use multidimensional input, which allows for more accurate prediction than one-dimensional (1-D) mappings but does not provide monotonic property that is expected quality. We investigate function using lattice (DLNs) to constraints input features provided by ViSQOL. The DLN improved 0.24 mean-square error on mixture datasets include voice over IP codec degradations, outperforming 1-D functions SVR well PESQ POLQA. Additionally, we show can be used learn quantile well-calibrated useful measure uncertainty. provides an data driven representations human interpretable scales, intervals predictions instead point estimates.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Speech Recognition with Deep Neural Networks for Impaired Speech

Automatic Speech Recognition has reached almost human performance in some controlled scenarios. However, recognition of impaired speech is a difficult task for two main reasons: data is (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM mode...

متن کامل

Robust speech recognition with speech enhanced deep neural networks

We propose a signal pre-processing front-end to enhance speech based on deep neural networks (DNNs) and use the enhanced speech features directly to train hidden Markov models (HMMs) for robust speech recognition. As a comprehensive study, we examine its effectiveness for different acoustic features, acoustic models, and training-testing combinations. Tested on the Aurora4 task the experimental...

متن کامل

Quality Estimation of Alaryngeal Speech

Abstract— Quality assessment can be done using subjective listening tests or using objective quality measures. Objective measures quantify quality. The sentence material is chosen from IEEE corpus. Real world noise data was taken from the noisy speech corpus NOIZEUS. Alaryngeal speaker‘s voice (alaryngeal speech) is recorded. To enhance the quality of speech produced from the prosthetic device,...

متن کامل

Estimating nonnegative matrix model activations with deep neural networks to increase perceptual speech quality.

As a means of speech separation, time-frequency masking applies a gain function to the time-frequency representation of noisy speech. On the other hand, nonnegative matrix factorization (NMF) addresses separation by linearly combining basis vectors from speech and noise models to approximate noisy speech. This paper presents an approach for improving the perceptual quality of speech separated f...

متن کامل

Deep Lattice Networks and Partial Monotonic Functions

We propose learning deep models that are monotonic with respect to a userspecified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network. We implement the layers and projections with new computational graph nodes in TensorFlow and use...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of the Acoustical Society of America

سال: 2021

ISSN: ['0001-4966', '1520-9024', '1520-8524']

DOI: https://doi.org/10.1121/10.0005130